
Analyzing Playstyles and Mechanics of Counter Strike

Alec Pool and Matt Durkin

Background: Counter Strike: Global Offensive is a video game in which two teams of 5 compete to win matches that consist of up to 30 individual rounds. The players are split into “Terrorist” and “Counter-Terrorist” teams: the Terrorists win a round by detonating a bomb at one of two ‘sites’ or eliminating all of the other team, and the Counter-Terrorists win by defusing the Terrorists’ bomb or eliminating all of the other team. Games are split into two halves of 15 rounds each, and each team spends one half as the Terrorists and one half as the Counter-Terrorists.

At the end of each round, each player is awarded a certain amount of in-game money based on their actions in the round (kills, planting the bomb, etc.), which team won the round, and the results of the preceding rounds (for example, losing a fifth round in a row grants each player on the losing team \$3400, whereas losing only one round grants each player just \$1400). This money is then used in the next round to purchase equipment (weapons, grenades, armor, etc.). As the game progresses, teams can afford stronger weapons and more grenades/armor. Each team's buying strategy at the beginning of a round is categorized as follows:

-Pistol Round - The first round of each half, where each team has only $800 per player and cannot afford to buy anything stronger than a pistol or grenades.

-Eco Round/Semi-Eco Round - When one team does not have enough money for a full buy and chooses to save most or all of their cash in the hopes of going into the following rounds with a better economy in which they can buy full loadouts.

-Force Buy - When one team does not have enough money for a full buy and chooses to spend it all on whatever they can afford. This will usually consist of them buying only the cheaper, less powerful rifles or SMGs.

-Full Buy/Normal Buy - When a team has enough money to afford the ‘best’ weapons and does not go out of their way to save for future rounds.

The weapons of Counter-Strike are split into 5 categories: Heavy (Shotguns and LMGs), Rifles, Snipers, Pistols, and SMGs. When players can afford to do so, they will usually buy rifles (AK47 or M4) and snipers (AWP).

More info on Counter Strike competitive mechanics

This analysis will look at some trends within Counter Strike matches as well as analyze the effects of using different weapons at different ranges and how the in-game economy and buying patterns influence the outcome of rounds/games.

Getting Started

To implement this data analysis, we used Python 3 with some important libraries such as pandas (for managing data), numpy (for working with numbers), matplotlib (for representing data visually), and various others.

In [116]:
#++MODULES

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread  # scipy.misc.imread was removed in SciPy 1.2
import warnings
warnings.filterwarnings('ignore')
import random
import sklearn
from sklearn import linear_model
import math
from statistics import mean
import seaborn as sns
import statsmodels.api as sm
from statsmodels.formula.api import ols
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier     
from sklearn.model_selection import KFold, cross_val_score

Tidying the Data

For this project we are working with Kevin Pei's CS:GO Competitive Matchmaking Data database.

Data Set

Kevin Pei

This data set, among other things, contains records of over 1000 public matches of various skill levels showing second-by-second recordings of different games with numbers on damage dealt, player positions, grenades thrown, bombs planted, etc. It also contains data on various maps of Counter Strike with their dimensions and renderings of these maps from a bird's-eye view. This will be useful in visualizing positional data. Specifically, we will be using map_data.csv, mm_grenades_demos.csv, mm_master_demos.csv, and the .png map files.

First, we need to load our data into our local environment. To do this, we'll use pandas to read the data from the csv files and store it as dataframes. This will give us our data in an easily accessible form to aid in our future manipulation and analysis. We will be using pandas throughout this project. For more info, visit pandas documentation.

In [117]:
df = pd.read_csv("data/CSVs/mm_master_demos.csv")

mapdf = pd.read_csv("data/CSVs/map_data.csv")

grenadedf = pd.read_csv("data/CSVs/mm_grenades_demos.csv")

After processing the data, we need to tidy it up by renaming certain columns to be more easily interpretable and dropping other columns that we won't need for future analysis. While tracking variables like armor damage or which site the bomb is planted at could be useful in other projects using this dataset, it is unnecessary for our purposes and simply takes up space.

In [118]:
#rename columns for clarity and drop unnecessary data
df.rename(columns = {'file': 'match_no'}, inplace = True)
#drop returns a new dataframe, so assign the result back
df = df.drop(columns=['date', 'tick', 'award', 'vic_side', 'arm_dmg', 'is_bomb_planted', 'bomb_site', 'att_rank'])

#rename columns for clarity and drop unnecessary data
mapdf = mapdf.drop(columns = ['ResX', 'ResY'])
mapdf.rename(columns = {'Unnamed: 0': 'map'}, inplace=True)

#rename columns for clarity and drop unnecessary data
grenadedf.drop(columns = ['hitbox', 'ct_eq_val', 't_eq_val', 'is_bomb_planted', 'arm_dmg', 'hp_dmg', 'winner_team', 'winner_side', 'att_rank', 'vic_rank', 'vic_pos_x', 'vic_pos_y', 'round_type', 'ct_eq_val', 't_eq_val', 'round', 'start_seconds', 'vic_id', 'vic_side', 'bomb_site', 'att_team', 'vic_team', 'end_seconds', 'att_id', 'seconds'], inplace = True)
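
Note that pandas `drop` returns a new DataFrame by default, so the result has to be assigned back (or `inplace=True` passed) for the columns to actually disappear. A minimal sketch on a toy frame (the values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Toy frame standing in for the map data; the values are made up.
toy = pd.DataFrame({"map": ["de_dust2"], "ResX": [1024], "ResY": [1024]})

# Without assignment or inplace=True, the original frame is left unchanged.
toy.drop(columns=["ResX", "ResY"])
unchanged_columns = list(toy.columns)

# Assigning the result back keeps only the remaining columns.
toy = toy.drop(columns=["ResX", "ResY"])
remaining_columns = list(toy.columns)

print(unchanged_columns)
print(remaining_columns)
```

Checking `df.columns` after the tidying cell is a quick way to confirm that the intended columns are gone.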

Next, we implement a new column that tracks whether the team with more expensive equipment in a certain round is the winner of that round. This will be useful in analysis later.

In [119]:
#check to see whether the winning team of each 
#round was also the team who spent the most $ 
#save to new column
df['higher_wins'] = False
for index, row in df.iterrows():
  if (((row['winner_side'] == 'CounterTerrorist') and (row['ct_eq_val'] > row['t_eq_val'])) or ((row['winner_side'] == 'Terrorist') and (row['t_eq_val'] > row['ct_eq_val']))):
    df.at[index, 'higher_wins'] = True
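
Because `iterrows` visits every row in Python, the loop above can be slow on a dataset of this size. As a sketch, the same flag can be computed with boolean column operations. The toy frame below is an assumption made for illustration; only the column names come from the dataset:

```python
import pandas as pd

# Toy rounds: the values are invented, the column names match the dataset.
rounds = pd.DataFrame({
    "winner_side": ["CounterTerrorist", "Terrorist", "Terrorist", "CounterTerrorist"],
    "ct_eq_val": [5000, 4000, 3000, 3000],
    "t_eq_val": [2000, 1000, 4500, 3000],
})

# True where the CT side won and had the more expensive equipment.
ct_won_richer = (rounds["winner_side"] == "CounterTerrorist") & (
    rounds["ct_eq_val"] > rounds["t_eq_val"]
)
# True where the T side won and had the more expensive equipment.
t_won_richer = (rounds["winner_side"] == "Terrorist") & (
    rounds["t_eq_val"] > rounds["ct_eq_val"]
)

rounds["higher_wins"] = ct_won_richer | t_won_richer
print(rounds)
```

As in the loop, rounds where both teams have equal equipment value are marked `False`.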

Exploratory Data Analysis

To begin with our analysis, we try to uncover certain trends within the data that could reveal insights into the behaviors of players within Counter Strike. Thanks to the comprehensive map and player positioning data, we can visualize some of our data in a way that is very easily interpretable.

Positioning Analysis

For our first positional analysis, we will simply visualize all damage dealt and received over the course of one random match from our dataset. The blue circles will represent someone dealing damage, and the red circles will represent someone receiving damage. Below is a representation of that on the fan-favorite map, Dust 2.

To do this, we will first load the background map as provided by the dataset. Next, we will pick a random game from the dataframe and create a new dataframe that contains only the incidents of damage from that match. Next we will use matplotlib to create a scatter plot from the position information included in our new dataframe and the map-specific coordinate boundaries from the map dataset.

We will be using matplotlib.pyplot throughout this project for visual representations of data. matplotlib.pyplot documentation

In [120]:
map_input = "de_dust2"
bg = imread(f'data/maps/{map_input}.png')

#select random game from dust2 and create matching dataframe
match = random.choice(list(df.loc[(df.map == map_input)].groupby('match_no').groups.keys()))
plot_df = df.loc[(df.match_no == match)] 

#find map coordinate data from map dataframe
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create matplotlib graph and overlay over background image
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.5, c='blue')
plt.scatter(plot_df['vic_pos_x'] + 25, plot_df['vic_pos_y'], alpha=.5, c='red')
plt.title(f'Damage Dealt', fontsize=30)
plt.axis('off')
plt.show()

From this image, we can get an idea of where players most often congregate and deal/receive damage. At the top and bottom centers, we can see where the Terrorist spawn (bottom in green) has a line of sight across the middle of the map and through a doorway next to Counter-Terrorist spawn. This is a common area for enemy engagement at the beginnings of rounds, and that is reflected in the data. We can also see that bomb sites and choke points between the two teams' spawns have the highest density of player interaction, whereas areas like the bottom left have very little data, as these areas do not often see any fighting.
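
The plotting cells in this section look up a map's coordinate bounds by calling `.sum()` on a one-row selection, which silently yields zeros if the map name is misspelled. As a sketch, the lookup could be wrapped in a helper that fails loudly instead. The name `map_extent` and the toy values below are our own assumptions and not part of the dataset:

```python
import pandas as pd

def map_extent(mapdf, map_name):
    """Return [StartX, EndX, StartY, EndY] for one map, for use as imshow's extent."""
    rows = mapdf.loc[mapdf["map"] == map_name]
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row for {map_name!r}, found {len(rows)}")
    row = rows.iloc[0]
    return [float(row["StartX"]), float(row["EndX"]),
            float(row["StartY"]), float(row["EndY"])]

# Toy map table with made-up bounds.
toy_maps = pd.DataFrame({
    "map": ["de_a", "de_b"],
    "StartX": [-100, -200],
    "EndX": [100, 300],
    "StartY": [-50, -60],
    "EndY": [50, 70],
})

extent = map_extent(toy_maps, "de_b")
print(extent)
```

The returned list can be passed directly as the `extent` argument of `plt.imshow`.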

Next, we can look at a more generalized representation of player movements and interactions in a specific map. In this case, we are looking at damage dealt (represented in red) by Terrorists (represented in blue) in the map Cache over the course of hundreds of matches (237 to be exact).

In [121]:
map_input = "de_cache"
bg = imread(f'data/maps/{map_input}.png')

#select all games from Cache and create matching dataframe
plot_df = df.loc[(df.map == map_input) & (df.att_side == 'Terrorist')] 

#find map coordinate data from map dataframe
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create matplotlib graph and overlay over background image
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.01, c='blue')
plt.scatter(plot_df['vic_pos_x'] + 25, plot_df['vic_pos_y'], alpha=.01, c='red')
plt.title(f'Damage Dealt by Terrorists', fontsize=30)
plt.axis('off')
plt.show()

From this, we can see clear trends in where the Terrorists and Counter-Terrorists are attacking each other. The distribution of Terrorists on the right and Counter-Terrorists on the left (where they spawn, respectively) is very clear, and common Counter-Terrorist defensive positions appear as large red masses, as this is where they are being attacked by Terrorists.

We can now look again at the exact same data, but this time restricted to only players using the 'AWP', a powerful sniper.

In [122]:
map_input = "de_cache"
bg = imread(f'data/maps/{map_input}.png')

#select all games from cache with AWP and create matching dataframe
plot_df = df.loc[(df.map == map_input) & (df.att_side == 'Terrorist') & (df.wp == 'AWP')] 

#find map coordinate data from map dataframe
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create matplotlib graph and overlay over background image
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.1, c='blue')
plt.scatter(plot_df['vic_pos_x'] + 25, plot_df['vic_pos_y'], alpha=.1, c='red')
plt.title(f'Damage Dealt by Terrorists with AWP', fontsize=30)
plt.axis('off')
plt.show()

We can see that the range of engagement is much longer here than in our previous plot, with the distance between clusters of players dealing/receiving damage being much greater. The opposite trend can be seen when selecting for short-range weapons like shotguns.

Next, we can use a similar positional analysis technique to analyze how players use their grenades within a match, specifically how they use smoke grenades. Smoke grenades provide an opaque wall of smoke that Terrorists will use to block Counter-Terrorist sightlines when attacking and Counter-Terrorists will use to prevent their enemies from moving into a bombsite. The thrower of each smoke grenade is represented in blue, while the landing spot of each smoke grenade is represented in red.

In [123]:
map_input = "de_mirage"

#create plot dataframe
plot_df = grenadedf.loc[(grenadedf.nade == 'Smoke') & (grenadedf.map == map_input) & (grenadedf.att_side == 'Terrorist')] 
bg = imread(f'data/maps/{map_input}.png')

#fetch map data
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create plot and display background
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.1, c='blue')
plt.scatter(plot_df['nade_land_x'] + 25, plot_df['nade_land_y'], alpha=.11, c='red')
plt.title(f'Terrorist Smoke Grenades', fontsize=20)
plt.axis('off')
plt.show()

#create plot dataframe
plot_df = grenadedf.loc[(grenadedf.nade == 'Smoke') & (grenadedf.map == map_input) & (grenadedf.att_side == 'CounterTerrorist')] 
bg = imread(f'data/maps/{map_input}.png')

#fetch map data
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create plot and display background
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.1, c='blue')
plt.scatter(plot_df['nade_land_x'] + 25, plot_df['nade_land_y'], alpha=.11, c='red')
plt.title(f'Counter-Terrorist Smoke Grenades', fontsize=20)
plt.axis('off')
plt.show()

As we can see, the Terrorists (right side spawn) tend to throw their grenades from between their spawn and A site (bottom center) to cut off the line of sight between their attack position and Counter-Terrorist spawn. A similar trend can be seen at site B (top left), though less well defined.

The Counter Terrorists, on the other hand, tend to throw the majority of their smokes at the choke points where the Terrorists pass through while trying to reach the bomb sites.

While the clusters are noticeable, it is also clear that the smoke grenades do not all land in their exact intended spot. We can select for only matches where the average player rank was above a certain threshold and see how that changes the outcome.

In [124]:
map_input = "de_mirage"

#create plot dataframe
plot_df = grenadedf.loc[(grenadedf.nade == 'Smoke') & (grenadedf.map == map_input) & (grenadedf.att_side == 'Terrorist') & (grenadedf.avg_match_rank > 15)] 
bg = imread(f'data/maps/{map_input}.png')

#fetch map data
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create plot and display background
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.5, c='blue')
plt.scatter(plot_df['nade_land_x'] + 25, plot_df['nade_land_y'], alpha=.5, c='red')
plt.title(f'Terrorist Smoke Grenades', fontsize=20)
plt.axis('off')
plt.show()

#create plot dataframe
plot_df = grenadedf.loc[(grenadedf.nade == 'Smoke') & (grenadedf.map == map_input) & (grenadedf.att_side == 'CounterTerrorist') & (grenadedf.avg_match_rank > 15)] 
bg = imread(f'data/maps/{map_input}.png')

#fetch map data
map_data = mapdf.loc[mapdf.map == map_input]
coords = [map_data.StartX.sum(), map_data.EndX.sum(), map_data.StartY.sum(), map_data.EndY.sum()]

#create plot and display background
plt.figure(figsize=(15, 15))
plt.imshow(bg, aspect = 'equal', interpolation = 'none', extent = coords)
plt.scatter(plot_df['att_pos_x'] + 25, plot_df['att_pos_y'], alpha=.5, c='blue')
plt.scatter(plot_df['nade_land_x'] + 25, plot_df['nade_land_y'], alpha=.5, c='red')
plt.title(f'Counter-Terrorist Smoke Grenades', fontsize=20)
plt.axis('off')
plt.show()

Here we can see the same data as before, but with much less variance in the positions. It is clear that these higher-ranked players have more experience and are more consistent in throwing accurate smoke grenades. We can also see that these higher-rank players appear to not use smoke grenades when attacking B very often relative to the general populace. Presenting the data like this can be valuable in finding insight into strategies that you may not be aware of.
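
The claim of reduced variance is made here by eye. One way it could be put on a numeric footing is to measure how far landing points sit from their centroid. The sketch below uses a hypothetical helper `landing_spread` and invented points. Since each map has several distinct smoke targets, it would have to be applied per landing cluster rather than to a whole map at once:

```python
from math import hypot
from statistics import mean

def landing_spread(points):
    """Mean distance of (x, y) points from their centroid."""
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    return mean(hypot(x - cx, y - cy) for x, y in points)

# Invented landing spots for one smoke target: a tight and a loose group.
tight = [(0, 0), (2, 0), (0, 2), (2, 2)]
loose = [(0, 0), (20, 0), (0, 20), (20, 20)]

print(landing_spread(tight))
print(landing_spread(loose))
```

A smaller value for the higher-ranked subset would support the visual impression of more consistent throws.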

Analyzing the relation of economy to round win-rate

Next, we analyzed how a team's spending habits predict whether or not they will win a round. Our hypothesis was that spending more money than the other team would result in increased chances of winning the round. To check this, we needed to group our data into individual rounds and look at whether the team who spent more in that round also won the round. To do this, we used the 'higher_wins' column that we created earlier. We then grouped the differences in team spending into increments of $1000 and found the win rate of the higher-spending teams within these increments. Finally, we used seaborn, a data visualization library built on matplotlib, to plot this in a bar graph. seaborn documentation

In [125]:
grouped = df.groupby(['match_no', 'round'])

#group data into individual rounds
econdf = pd.DataFrame()
count = 0
for name, group in grouped:
  econdf = pd.concat([econdf, group.head(1)], ignore_index = True)
  count += 1
  if count > 25000:
    break

#create new dataframe from grouped data for plotting purposes
econdf = econdf[['ct_eq_val', 't_eq_val', 'higher_wins']]
for index, row in econdf.iterrows():
  econdf.at[index, 'diff'] = abs(row['ct_eq_val'] - row['t_eq_val'])

econ = econdf.groupby(['diff'])['higher_wins'].mean().to_frame()
for index, row in econ.iterrows():
  econ.at[index, 'diffc'] = index

#take avg of winning probability in increments of $1000
problist = []
for i in range(0, 30000, 1000):
  avg = econ.loc[(econ['diffc'] > i) & (econ['diffc'] <= i + 1000)]
  problist.append(avg['higher_wins'].mean())
In [126]:
X = []
Y = []

#create x and y variables from earlier data
for i in range(0, 30):
    X.append(i)

for prob in problist:
    Y.append(prob)

#create plot
plt.figure(figsize=(12,6))
p = sns.barplot(x=X, y = Y, orient='v')
p.set_xlabel("Difference in money spent (in thousands of dollars)", fontsize = 15)
p.set_ylabel("Likelihood of winning round", fontsize = 15)
p.set_title('Probability of team with higher value equipment winning round', fontsize = 20)
Out[126]:
Text(0.5, 1.0, 'Probability of team with higher value equipment winning round')

As we can see, the likelihood of winning a round clearly correlates with how much more money a team spends than their opponents. While the probability of winning hovers around 50% when the difference is <\$5000, it increases to nearly 100% when the difference in spending approaches \$30,000. This demonstrates how important the economy in Counter Strike is and how difficult it is to win a round against a full-buying team when you are on an eco or force buy round. More analysis could be done of the economic impact on round wins, but we would expect it to show similar trends.
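
The \$1000 binning can also be sketched with `pd.cut`, which assigns each round to a right-closed interval, the same `(i, i + 1000]` convention as the loop above. The toy rounds below are invented for illustration. Note one difference: this version weights every round equally, whereas the cell above first averages per distinct spending difference and then averages those means within each bin:

```python
import pandas as pd

# Invented rounds; only the column names follow the dataset.
toy = pd.DataFrame({
    "ct_eq_val": [4000, 4500, 20000, 25000, 3000, 26000],
    "t_eq_val": [4200, 4000, 5000, 4000, 3500, 3000],
    "higher_wins": [True, False, True, True, False, True],
})
toy["diff"] = (toy["ct_eq_val"] - toy["t_eq_val"]).abs()

# Bin edges every $1000 from 0 to 30000; intervals are (left, right].
edges = list(range(0, 31000, 1000))
bins = pd.cut(toy["diff"], bins=edges)

# Win rate of the higher-spending team in each bin that contains rounds.
win_rate = toy.groupby(bins, observed=True)["higher_wins"].mean()
print(win_rate)
```

Rounds with a difference of exactly 0 fall outside every interval in both versions.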

Weapon Analysis

Next, we wanted to look at the statistics around different weapons' usage. Specifically, we wanted to analyze how effective weapons are at different ranges. For example, we predicted that a sniper would logically have more use at long ranges and a shotgun or SMG would have more use at close ranges.

First, we needed to calculate the average distance at which each weapon was used to kill an enemy. To do this, we determined when a player was killed by checking whether the sum of the damage they received reached 100 (Counter Strike has no health regeneration), then calculated their average distance from their killer over the course of receiving damage using the Pythagorean theorem ($x^2 + y^2 = z^2$). We chose to use the average distance over the course of the engagement to account for enemies moving during the fight. Often, the engagement would start at a long distance, then the losing player would take cover and be pushed by the attacking player. If we only took the final distance, our figures would likely understate the true range of the engagement.
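
The distance calculation described above can be written out explicitly. Assume a victim takes $n$ hits, with attacker position $(x^{a}_i, y^{a}_i)$ and victim position $(x^{v}_i, y^{v}_i)$ at hit $i$ (this notation is ours, introduced for illustration):

$$d_i = \sqrt{(x^{a}_i - x^{v}_i)^2 + (y^{a}_i - y^{v}_i)^2}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$

Only the horizontal coordinates are used, so height differences between players are not reflected in $\bar{d}$.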

In [127]:
groups = df.groupby(['match_no', 'round', 'vic_id'])

netWeaponDistances = {}
netWeaponInstances = {}

netWeaponTypeDistances = {}
netWeaponTypeInstances = {}

#get distance with pythag
def distance(x1,y1,x2,y2):
     return math.sqrt( ((int(x1)-int(x2))**2)+((int(y1)-int(y2))**2) )

#find average distance for multiple hits
def meanDistance(AttackXCords, AttackYCords, VictimXCords, VictimYCords):
    totalDistance = 0.0
    totalHits = 0

    merged_list = tuple(zip(AttackXCords, AttackYCords,VictimXCords,VictimYCords))

    for tup in merged_list:
        x1 = tup[0]
        y1 = tup[1]
        x2 = tup[2]
        y2 = tup[3]

        totalDistance = totalDistance + distance(x1,y1,x2,y2)
        totalHits = totalHits + 1

    return totalDistance/totalHits

#roundGroups[hp_dmg].sum()

i = 0

for name, group in groups:

    totalDamage = group['hp_dmg'].sum()
    result = True
    weaponUsed = group['wp']

    weaponTypeUsed = group['wp_type']

    numIters = 0

    weaponType = ''
    
    for elt in weaponTypeUsed:
        if numIters == 0:
            weaponType = elt
        elif weaponType != elt:
            result = False
            break
        
        numIters = 1

    numIters = 0
    weapon = ''
    
    for elt in weaponUsed:
        if numIters == 0:
            weapon = elt
        elif weapon != elt:
            result = False
            break
        
        numIters = 1

    if weapon == 'Unknown' or weaponType == 'Equipment' or weaponType == 'Grenade':
        continue

    if (result):
        if totalDamage == 100:
            AX = group['att_pos_x']
            AY = group['att_pos_y']

            VX = group['vic_pos_x']
            VY = group['vic_pos_y']
          
            try:
                netWeaponDistances[weapon] = netWeaponDistances[weapon] + meanDistance(AX,AY,VX,VY)
            except KeyError:
                netWeaponDistances[weapon] = meanDistance(AX,AY,VX,VY)

            try:
                netWeaponInstances[weapon] = netWeaponInstances[weapon] + 1
            except KeyError:
                netWeaponInstances[weapon] = 1

            try:
                netWeaponTypeDistances[weaponType] = netWeaponTypeDistances[weaponType] + meanDistance(AX,AY,VX,VY)
            except KeyError:
                netWeaponTypeDistances[weaponType] = meanDistance(AX,AY,VX,VY)

            try:
                netWeaponTypeInstances[weaponType] = netWeaponTypeInstances[weaponType] + 1
            except KeyError:
                netWeaponTypeInstances[weaponType] = 1
        
    
    #i = i + 1
    if i >= 10:
        break

finalWeaponDistances = {}

for key in netWeaponDistances:
    finalWeaponDistances[key] = netWeaponDistances[key] / netWeaponInstances[key]

finalWeaponTypeDistances = {}

for key in netWeaponTypeDistances:
    finalWeaponTypeDistances[key] = netWeaponTypeDistances[key] / netWeaponTypeInstances[key]

sortedWeaponDistances = {k: v for k, v in sorted(finalWeaponDistances.items(), key=lambda item: item[1])}

sortedWeaponTypeDistances = {k: v for k, v in sorted(finalWeaponTypeDistances.items(), key=lambda item: item[1])}

#print(sortedWeaponDistances)
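
The running totals above are kept with paired `try`/`except KeyError` blocks. As a sketch of a more compact alternative, `collections.defaultdict` supplies the initial value automatically. The helper name `mean_by_key` and the sample values are hypothetical:

```python
from collections import defaultdict

def mean_by_key(samples):
    """Average the values for each key and return them sorted ascending by mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in samples:
        totals[key] += value
        counts[key] += 1
    means = {key: totals[key] / counts[key] for key in totals}
    return dict(sorted(means.items(), key=lambda item: item[1]))

# Invented (weapon, mean engagement distance) samples.
toy_samples = [
    ("AWP", 1500.0),
    ("Sawed-Off", 200.0),
    ("AWP", 1300.0),
    ("Sawed-Off", 240.0),
]

sorted_means = mean_by_key(toy_samples)
print(sorted_means)
```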

After calculating these distances, we wanted to plot them on a bar plot for each weapon. We did this once again using seaborn.

In [128]:
X = []
Y = []

for weapon in sortedWeaponDistances.keys():
    X.append(str(weapon))

for dist in sortedWeaponDistances.values():
    Y.append(float(dist))

plt.figure(figsize=(12,6))
plt.xticks(rotation = 80)
p = sns.barplot(x=X, y = Y, orient='v')
p.set_title('Average distance of kills by each weapon')
p.set_xlabel('Weapon')
p.set_ylabel('Distance (in Source engine units)')
Out[128]:
Text(0, 0.5, 'Distance (in Source engine units)')

As would be expected, different weapons have different average ranges of kills. Short range weapons like the Sawed-off Shotgun and Tec-9 (pistol) have extremely close effective ranges and snipers like the AWP, G3SG1, and Scout have the longest effective ranges.

Now is a good time to explain how distance works in CS:GO's Source Engine. 16 units of distance in Source Engine is equivalent to 1 foot in real world terms. By this, a sawed off has a real world equivalent effective range of ~14', whereas a scout has a range of ~85'. Source Engine dimensions
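
The conversion is simple enough to sketch directly, using the 16 units per foot figure stated above (the function names are our own):

```python
UNITS_PER_FOOT = 16  # conversion factor stated in the text above

def units_to_feet(units):
    """Convert Source engine units to feet."""
    return units / UNITS_PER_FOOT

def feet_to_units(feet):
    """Convert feet to Source engine units."""
    return feet * UNITS_PER_FOOT

# The ~14' and ~85' ranges quoted above, expressed in engine units.
print(feet_to_units(14))
print(feet_to_units(85))
print(units_to_feet(224))
```

By this conversion, ~14' corresponds to about 224 units and ~85' to about 1360 units.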

Next, we calculated the average weapon ranges among the five different weapon classes: Heavy (shotguns/LMGs), SMGs, Pistols, Rifles, and Snipers.

In [129]:
X = []
Y = []

for weaponType in sortedWeaponTypeDistances.keys():
    X.append(str(weaponType))

for dist in sortedWeaponTypeDistances.values():
    Y.append(float(dist))

plt.figure(figsize=(12,6))
plt.xticks(rotation = 80)
p = sns.barplot(x=X, y = Y, orient='v')
p.set_ylabel('Distance in Source Engine units')
p.set_xlabel('Weapon Class')
p.set_title('Average distance of kills by weapon class')
Out[129]:
Text(0.5, 1.0, 'Average distance of kills by weapon class')

This data is once again consistent with what one would expect. The heavy weapons category is slightly misleading, as it contains both shotguns, which have a short effective range, and LMGs, which have a long effective range.

Next, we analyzed different weapon matchups. Our hypothesis was that a matchup like a pistol vs sniper would have interesting results, where the sniper wins in long range engagements and the pistol wins in close range engagements.

In [138]:
def roundup(x):
    return int(math.ceil(x / 600.0)) * 600
    

roundGroups = df.groupby(['match_no', 'round'])


numRounds = 0

numRows = 0





weaponDistanceHealth = {} #  weapon -> list of (distance, health) tuples 





# map each attacker for each round with their weapon that round, then search the dict for their ID and if they are there' retrieve their gun
# go through each value of attacker, see if they killed the other person (2 people can show up as attackers, but just one of them will win)

weaponKills = {} # 'tuple' (<weapon of attacker used to kill>, <weapon of victim who died>) -> (number of times this happened, TOTAL DISTANCE) , we will eventually compare the inverse to find a win average and calculate mean distance of a win

weaponTypeKills = {} # 'tuple' (<weapon type of attacker used to kill>, <weapon of victim who died>) -> (number of times this happened, TOTAL DISTANCE) , we will eventually compare the inverse to find a win average and calculate mean distance of a win


rowCount = 0

#mappedEncounters = {}

# we need a mapping of attacker -> victims they attacked and also a mapping victims -> who they got attacked by


for name, round in roundGroups:
    roundDF = pd.DataFrame(round)

    # for each round, make a dict mapping player ID to their weapon that round
    playerWeapons = {}  # playerID -> weapon used this round

    # if the encounter resulted in 100 damage, then the enemy was killed, log it as a kill from one weapon vs another and also the distance the players were
    #weaponKills = {}  # weapon used to kill -> weapon of victim

    # have to keep track of each victim's health throughout the whole round. If victim health reaches <=0, then map the attacker's weapon to the victim's weapon if it exists.
    # If the victim doesn't yet exist in the mapping, then subtract the damage done from 100
    playerHealth = {}  # playerID -> health (0-100)


    playerWeaponClasses = {}

    #encounters = round.groupby(['att_id','vic_id'])



    for index, row in roundDF.iterrows():
        rowCount = rowCount + 1

        attacker = row['att_id']
        victim = row['vic_id']

        attackWeapon = row['wp']
        damage = row['hp_dmg']

        weaponClass = row['wp_type']


        AX = row['att_pos_x']
        AY = row['att_pos_y']

        VX = row['vic_pos_x']
        VY = row['vic_pos_y']
        
        # removes non-weapons from consideration
        if attackWeapon == 'Unknown' or weaponClass == 'Equipment' or weaponClass == 'Grenade':
            continue
        

        # map each player's weapon from the round
        playerWeapons[attacker] = attackWeapon
        playerWeaponClasses[attacker] = weaponClass


        # map each player's health
        try:
            playerHealth[victim] = playerHealth[victim] - damage
        except KeyError:
            playerHealth[victim] = 100 - damage

        if attacker not in playerHealth:
            playerHealth[attacker] = 100


        # track health of attacker
        '''
        if attackWeapon in weaponDistanceHealth:
            list_ = weaponDistanceHealth[attackWeapon]
            list_.append((distance(AX, AY, VX, VY),playerHealth[attacker]))
            weaponDistanceHealth[attackWeapon] = list_
        else:
            list_ = []
            list_.append((distance(AX, AY, VX, VY),playerHealth[attacker]))
            weaponDistanceHealth[attackWeapon] = list_
        '''
        
        
            


        # check if victim is dead
        if playerHealth[victim] <= 0:
            # check if victim's weapon is logged
            try:
                victimWeapon = playerWeapons[victim]
                victimWeaponClass = playerWeaponClasses[victim]




                try:
                    _distanceMAP = weaponTypeKills[(weaponClass, victimWeaponClass)] 
                    
                    dist_ = roundup(distance(AX, AY, VX, VY))

                    try:
                        currTup = _distanceMAP[dist_]
                        (_wins) = currTup
                        _distanceMAP[dist_] = (_wins + 1)
                        weaponTypeKills[(weaponClass, victimWeaponClass)] = _distanceMAP
                        
                    except KeyError:
                        _distanceMAP[dist_] = (1)
                        weaponTypeKills[(weaponClass, victimWeaponClass)] = _distanceMAP
                    
                except KeyError:
                    weaponTypeKills[(weaponClass, victimWeaponClass)] = {roundup(distance(AX, AY, VX, VY)) : (1)}



                try:


                    # WEAPON CLASS TRACKING
                    try:
                        list_ = weaponDistanceHealth[weaponClass]
                        list_.append((distance(AX, AY, VX, VY),playerHealth[attacker]))
            
                        weaponDistanceHealth[weaponClass] = list_
                    except (KeyError, AttributeError):
                        list2_ = []
                        list2_.append((distance(AX, AY, VX, VY),playerHealth[attacker]))
                        weaponDistanceHealth[weaponClass] = list2_



                    
                    try:
                        list_ = weaponDistanceHealth[victimWeaponClass]
                        list_.append((distance(AX, AY, VX, VY),playerHealth[victim]))
            
                        weaponDistanceHealth[victimWeaponClass] = list_
                    except (KeyError, AttributeError):
                        list2_ = []
                        list2_.append((distance(AX, AY, VX, VY),playerHealth[victim]))
                        weaponDistanceHealth[victimWeaponClass] = list2_

                    
                    ###################################

                    # EXACT WEAPON TRACKING
                    (occurences, totalDistance) = weaponKills[(attackWeapon, victimWeapon)]

                    weaponKills[(attackWeapon, victimWeapon)] = (occurences + 1, totalDistance + distance(AX, AY, VX, VY))
                except KeyError:
                    weaponKills[(attackWeapon, victimWeapon)] = (1, distance(AX, AY, VX, VY))
                    
                    ###################################



            except KeyError:
                pass
                
        


        #numRows = numRows + 1
        if numRows >= 200:
            break
    
    numRows = 0
    
    #numRounds = numRounds + 1
    if numRounds >= 10000:
        break



def f(tup):
    (oc, dist) = tup
    return dist/oc

# Before: weaponKills = 'tuple' (<weapon of attacker used to kill>, <weapon of victim who died>) -> (number of times this happened, TOTAL DISTANCE) 
# After: newDict = (attack weapon, victim weapon) -> mean distance    this is of kills
newDict = {k: f(v) for k, v in weaponKills.items()}


doneMatchups = {}

for item in newDict.items():
    (key_, distance_) = item
    (weapon1, weapon2) = key_

    if key_ in doneMatchups or (weapon2, weapon1) in doneMatchups:
        continue 
    else:
        doneMatchups[key_] = 1


    try:
        weapon2WinDistance = newDict[(weapon2, weapon1)]
    except KeyError:
        weapon2WinDistance = -1


    #if weapon2WinDistance != -1:
        #print("Matchup between " + str(weapon1) + " and " + str(weapon2) + "\n")
        #print(str(weapon1) + " wins at " + str(distance_) + "\n" + str(weapon2) + " wins at "+ str(weapon2WinDistance) + "\n\n\n")
    #else:
        #print("Matchup between " + str(weapon1) + " and " + str(weapon2) + "\n")
        #print(str(weapon1) + " wins at " + str(distance_) + "\n" + str(weapon2) + " wins at UNKNOWN"+ "\n\n\n")



#print(rowCount)



# Before: weaponTypeKills = 'tuple' (<weapon type of attacker used to kill>, <weapon type of victim who died>) -> {rounded distance -> number of times this happened}
# After: (weapon type 1, weapon type 2) -> {distance -> win% of weapon type 1}, computed in the plotting cell
typeNewDict = {}   # (weapon1, weapon2) -> mapping of distance -> win% of weapon 1

doneDict = {}

#winPercentDict = {k: processWeaponClass(v) for k, v in weaponTypeKills.items()} #(w1,w2) -> (dist -> win pcnt)

FINAL_PERCENTS = {}

matchups = {}

There are hundreds of different weapon matchups in Counter Strike, so we can't possibly show them all here. Instead, we picked a few interesting ones. Each value is the mean distance (in game units) at which that weapon won the matchup:

Matchup between AWP(Sniper) and Tec9(Pistol):

AWP wins at 914.0047596174438

Tec9 wins at 366.67515708765154


Matchup between SawedOff(Shotgun) and USP (Pistol):

SawedOff wins at 281.49619076514665

USP wins at 643.471337353963


Matchup between AK47(Rifle) and M4A4(Rifle):

AK47 wins at 654.5547327705935

M4A4 wins at 665.138594321631


The full list of weapon matchups can be found here.
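
The figures above are mean kill distances: for each (attacker weapon, victim weapon) pair, the total distance of all recorded kills divided by the number of kills. As a minimal sketch of that running tally, here is a standalone version that uses made-up sample kills rather than the real dataset. The helper names `record_kill` and `mean_distances` are ours and exist only for illustration.

```python
from collections import defaultdict

def record_kill(tally, attacker_weapon, victim_weapon, dist):
    """Add one kill to the running (count, total distance) tally."""
    count, total = tally[(attacker_weapon, victim_weapon)]
    tally[(attacker_weapon, victim_weapon)] = (count + 1, total + dist)

def mean_distances(tally):
    """Collapse (count, total distance) into a mean distance per matchup."""
    return {pair: total / count for pair, (count, total) in tally.items()}

# Hypothetical sample kills, NOT values from the dataset.
tally = defaultdict(lambda: (0, 0.0))
record_kill(tally, "AWP", "Tec9", 900.0)
record_kill(tally, "AWP", "Tec9", 1100.0)
record_kill(tally, "Tec9", "AWP", 350.0)

means = mean_distances(tally)
print(means[("AWP", "Tec9")])  # mean distance of AWP kills on Tec9 users
print(means[("Tec9", "AWP")])  # mean distance of Tec9 kills on AWP users
```

Keeping a count and a running total means the mean can be computed at the end without storing every individual kill.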

While the individual matchups are interesting, they do not make broader trends easy to see. Grouping matchups by weapon type gives a clearer picture of which categories of weapon are effective against others at certain ranges. To demonstrate this, we plot the categories against one another, showing the probability of one category defeating the other at different engagement ranges.

In [141]:
for elt in weaponTypeKills.items():
    (weaponPair, distWinDict) = elt

    (w1,w2) = weaponPair

    # Skip mirror matchups (e.g. Rifle vs Rifle)
    if w1 == w2:
        continue

    # Skip pairs that were already plotted in the opposite order
    if (w2, w1) in matchups:
        continue

    matchups[(w1,w2)] = 1

    # Kills by w2 users against w1 users; empty if that never happened
    otherDict = weaponTypeKills.get((w2,w1), {})

    X = []
    Y = []

    for elt2 in distWinDict.items():
        (dist_2, wins_2) = elt2

        # Only keep distance buckets in which both sides scored kills
        try:
            wins_3 = otherDict[dist_2]
        except KeyError:
            continue

        X.append(dist_2)
        Y.append(float(wins_2)/float(wins_2 + wins_3) * 100.0)

    # Nothing to plot if the two classes share no distance bucket
    if not X:
        continue

    TITLE = "Win Percentage of " + str(w1) + " VS " + str(w2) + ' at Increasing Distance'
    plt.figure(figsize=(9,6))
    plt.xticks(rotation = 80)
    # Newer seaborn versions require x and y to be passed as keywords
    sns.regplot(x=X, y=Y).set(title= TITLE, xlabel="Distance", ylabel = "Win Percent of " + str(w1))

These graphs show clear relationships between weapon category matchups and engagement distance. Nearly every matchup follows a roughly linear trend as distance increases. For Sniper vs Pistol, for example, the chance of a player armed with a sniper beating a player armed with a pistol rises from ~20% at close range to nearly 100% at the longest ranges. With Rifles vs Snipers, the rifle clearly wins close range fights and gradually loses effectiveness until the Sniper is clearly the better option at extreme long range. This method of visualization is well suited to showing how weapon matchups change with range, and it is interesting in the context of our previous data, such as the mapping of AWP users on Cache. We can also see that in a matchup like Pistol vs Rifle, the pistol user never has a greater than 50% chance of defeating the rifle user, regardless of engagement distance. As pistols are significantly less expensive than rifles, this helps explain our earlier data indicating that teams who spend more on their equipment have significantly better odds of winning the round. A team that spends \$15k on rifles against a team that spends only \$2k on pistols has very good odds.
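
To make the plotted quantity concrete: within each distance bucket, the win percentage of one category is its kills against the other category divided by the kills in both directions. The sketch below is a standalone illustration with made-up counts, and `win_percent_by_distance` is a hypothetical helper name rather than a function from our notebook.

```python
def win_percent_by_distance(wins_a, wins_b):
    """Return {distance bucket: % of engagements won by side A}.

    wins_a and wins_b map a rounded distance to a kill count.
    Only buckets in which both sides have at least one kill are kept,
    mirroring the plotting loop above.
    """
    shared = sorted(set(wins_a) & set(wins_b))
    return {d: 100.0 * wins_a[d] / (wins_a[d] + wins_b[d]) for d in shared}

# Hypothetical kill counts per 100-unit bucket, NOT real data.
sniper_beats_pistol = {100: 2, 500: 6, 1500: 9}
pistol_beats_sniper = {100: 8, 500: 4, 2000: 1}

pct = win_percent_by_distance(sniper_beats_pistol, pistol_beats_sniper)
print(pct)
```

One consequence of this design is that a bucket where only one side ever recorded a kill is dropped entirely, so the most one-sided ranges may be underrepresented in the plots.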

We then wanted to plot how frequently each weapon class kills at different distances. To do so, we counted the number of kills within each distance interval, by weapon class.

In [135]:
# weaponDistanceHealth: weapon class -> list of (distance, health) tuples
weaponList = ['SMG', 'Heavy', 'Pistol', 'Rifle', 'Sniper']

endList = []

for weapon in weaponList:
    # Distances recorded for this weapon class, skipping negative health values
    X = []

    for tup in weaponDistanceHealth[weapon]:
        (distance__, health__) = tup

        if health__ >= 0:
            X.append(distance__)

    TITLE = "Frequency of " + str(weapon) + " Kills at Each Distance"

    plt.figure(figsize=(12,6))
    plt.xticks(rotation = 80)
    sns.histplot(X, bins=50).set(title= TITLE, xlabel="Distance", ylabel = "Occurrences")

    endList.append(X)

From these results, we see a few things. All weapons have a large number of kills at very close distances, which can be explained by the general gameplay flow of Counter Strike forcing players into close contact regardless of their weapon choice. We then see that certain weapon classes such as pistols and SMGs have much higher frequency of use in close range engagements than mid range, whereas weapon classes like rifles tend to favor midrange. Finally, we can see a noticeable spike in long range kills by snipers.
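
For readers who want the raw counts rather than a plot, the interval counting can be sketched without any plotting library. This is an illustrative standalone version using fixed 300-unit intervals and made-up distances. `count_by_interval` is a hypothetical helper, and the interval width is an assumption, since the histograms above let seaborn choose 50 bins.

```python
from collections import Counter

def count_by_interval(distances, width=300):
    """Count distances per [k*width, (k+1)*width) interval.

    Keys are the lower bound of each interval, in game units.
    """
    counts = Counter(int(d // width) * width for d in distances)
    return dict(sorted(counts.items()))

# Hypothetical kill distances in game units, NOT real data.
sample = [50.0, 120.5, 299.9, 300.0, 610.2, 2250.0]
print(count_by_interval(sample))
```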

We found that this data is easier to interpret in a different type of plot, so we reformatted it into a regression plot. Rather than plotting the number of occurrences, we plot the average remaining health of players who use these weapon types at various ranges. A higher remaining health means that the weapon was more effective and allowed the player to kill their opponent before being killed themselves. A remaining health value of zero indicates that the player died in the engagement at that range.

In [142]:
def roundup(x):
    # Round a distance up to the next multiple of 100 units
    return int(math.ceil(x / 100.0)) * 100


weaponList = ['SMG', 'Heavy', 'Pistol', 'Rifle', 'Sniper']

for weapon in weaponList:
    X = []
    Y = []

    # rounded distance -> list of remaining health values
    distHealth = {}

    for tup in weaponDistanceHealth[weapon]:
        (distance__, health__) = tup

        # Skip negative health values and engagements at 2500 units or more
        if health__ < 0 or distance__ >= 2500:
            continue

        distHealth.setdefault(roundup(distance__), []).append(health__)

    # One point per distance bucket: the mean remaining health
    for tup in distHealth.items():
        (DISTANCE, HEALTH) = tup
        X.append(DISTANCE)
        Y.append(mean(HEALTH))


    TITLE = "Average Health Left after Engagement Using " + str(weapon) + " at Various Distances"
    plt.figure(figsize=(12,6))
    plt.xticks(rotation = 80)
    # order=2 fits a quadratic; newer seaborn versions require x and y as keywords
    sns.regplot(x=X, y=Y, order=2).set(title= TITLE, xlabel="Distance", ylabel = "Average Health Left")

From this, we can more clearly see trends in the effectiveness of different weapon classes. Weapon classes like Heavy and Pistol show steep dropoffs past certain ranges, while weapon classes like Sniper only become more effective with range. In the SMG plot, past 2000 units of range (about 125 feet), the average remaining health approaches zero, indicating that these players usually died when trying to use SMGs at that range. For snipers, we see the opposite effect, which is consistent with our earlier findings.
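
The distance axis in these plots is in game units. The feet figure quoted above implies a ratio of 16 units per foot (2000 / 125), so distances can be converted to real-world lengths with a small helper. This is a sketch under that assumption, and the function names are ours.

```python
UNITS_PER_FOOT = 16       # assumption: implied by 2000 units being about 125 feet
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot

def units_to_feet(units):
    """Convert game units to feet under the assumed ratio."""
    return units / UNITS_PER_FOOT

def units_to_meters(units):
    """Convert game units to meters via feet."""
    return units_to_feet(units) * METERS_PER_FOOT

for u in (500, 1000, 2000):
    print(u, "units is about", units_to_feet(u), "ft")
```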

Conclusion

A number of different conclusions can be drawn from the data presented here.

From a player's perspective, the data here can be used to think about one's own gameplay strategies and how to improve them. For example, one could look at the maps of player positions and use them to reconsider one's own positioning and how to adjust it to counter other players' strategies.

One could also compare the map of common smoke grenade throws with a map of player positioning to work out which smoke throws might be most effective in countering common strategies. One could look at the map of smokes filtered by players above a certain rank to see which strategies more experienced players use, and then adopt them in one's own gameplay.

A player could look at the analysis of weapon matchup ranges to decide which weapon to use when playing positions that lead to different engagement ranges. For example, the AWP vs Tec-9 matchup shows that a Tec-9 user has better odds of defeating an AWP user if they can close the distance and engage from close range. It also suggests that a player who buys an AWP would be well advised to also buy a secondary pistol that performs well at the ranges and in the matchups where the AWP is less effective.

From a developer's perspective, this data also has a lot to show about player trends and how different mechanics affect gameplay. This data was collected years ago, and Valve's own analysis since then has resulted in changes to game mechanics. For example, consider the Sawed-Off shotgun's effective range. It is noticeably short, and the weapon is ineffective in matchups against weapons that cost less and should be weaker. Valve has since implemented multiple range increases to this weapon, which demonstrates how analyzing weapon data in this form can be useful for understanding how weapon matchups work and what should be changed.

There are also a number of other analyses and conclusions that could be drawn from this dataset. For example, one could look at the prevalence of different maps, or even check which maps are played most at different times. Maybe Dust 2 is played more often in winter months than in summer months? Maybe players at higher ranks play a map like Cache more often than players at lower ranks? We've only scratched the surface of this dataset's potential. Data points that we completely ignored, such as bomb site choice and defuses, all offer ample opportunity for analysis. We encourage readers to take our code, play around with different variables and filters, and see what interesting insights they can find. There were a number of other configurations we would have liked to include but left out to keep this writeup from becoming unnecessarily long.
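
As a starting point for the map questions above, here is a minimal sketch of how one might compare map prevalence across groups such as season or rank. Everything in it is hypothetical: the records are made up, and the field names (`map`, `month`, `avg_rank`) and the rank threshold are assumptions that may not match the dataset's actual columns.

```python
from collections import Counter, defaultdict

# Hypothetical match records, NOT real data; field names are assumptions.
matches = [
    {"map": "de_dust2", "month": 1, "avg_rank": 6},
    {"map": "de_dust2", "month": 12, "avg_rank": 14},
    {"map": "de_cache", "month": 2, "avg_rank": 15},
    {"map": "de_cache", "month": 7, "avg_rank": 16},
    {"map": "de_dust2", "month": 7, "avg_rank": 5},
    {"map": "de_mirage", "month": 8, "avg_rank": 9},
]

def map_share(records, group_of):
    """Return {group: {map: fraction of that group's matches}}."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[group_of(rec)][rec["map"]] += 1
    return {
        group: {m: n / sum(c.values()) for m, n in c.items()}
        for group, c in counts.items()
    }

# Group by season (winter = December through February) and by rank band.
by_season = map_share(matches, lambda r: "winter" if r["month"] in (12, 1, 2) else "other")
by_rank = map_share(matches, lambda r: "high" if r["avg_rank"] >= 12 else "low")

print(by_season)
print(by_rank)
```

Comparing shares rather than raw counts keeps the groups comparable even when one group contains far more matches than the other.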
